Abstract

The cophenetic metrics $d_{\varphi,p}$, for $p \in \{0\} \cup [1,\infty[$, are a recent addition to the
kit of available distances for the comparison of phylogenetic trees. Based on a
fifty-year-old idea of Sokal and Rohlf, these metrics compare phylogenetic trees
on the same set of taxa by encoding them by means of their vectors of cophenetic
values of pairs of taxa and depths of single taxa, and then computing the $L^p$
norm of the difference of the corresponding vectors. In this paper we compute
the expected value of the square of $d_{\varphi,2}$ on the space of fully resolved rooted
phylogenetic trees with $n$ leaves, under the Yule and the uniform probability
distributions.

Keywords: Phylogenetic tree, Cophenetic metric, Uniform model, Yule model, 
Sackin index, Total cophenetic index 



1. Introduction 

The definition and study of metrics for the comparison of rooted phylogenetic
trees on the same set of taxa is a classical problem in phylogenetics [10, Ch. 30],
and many metrics have been introduced so far for this purpose. A recent
addition to the set of metrics available in this context are the cophenetic metrics
$d_{\varphi,p}$ introduced in [5]. Based on a fifty-year-old idea of Sokal and Rohlf, these
metrics compare phylogenetic trees on the same set of taxa by first encoding
the trees by means of their vectors of cophenetic values of pairs of taxa and
depths of single taxa, and then computing the $L^p$ norm of the difference of the
corresponding vectors.

Once the dissimilarity between two phylogenetic trees has been computed
through a given metric, it is convenient in many situations to assess its
significance. One possibility is to compare the value obtained with its expected, or
mean, value: is it much larger, much smaller, similar? [55] This makes it
necessary to study the distribution of the metric, or, at least, to have a formula for
the expected value of the metric for any number $n$ of leaves. The distribution
of several metrics has been studied so far: see, for instance, |H1 EH El HE] • 

The expected value of a distance depends on the probability distribution on
the space of phylogenetic trees under consideration. The most popular
distribution on the space $\mathcal{T}_n$ of binary phylogenetic trees with $n$ leaves is the uniform
distribution, under which all trees in $\mathcal{T}_n$ are equiprobable. But phylogeneticists
also consider other probability distributions on $\mathcal{T}_n$, defined through stochastic
models of evolution [10, Ch. 33]. The most popular is the so-called Yule model
[H][22]) defined by an evolutionary process where, at each step, each currently 
extant species can give rise, with the same probability, to two new species. Un- 
der this model, different phylogenetic trees with the same number of leaves may 
have different probabilities, which depend on their shape. 

In this paper we provide explicit formulas for the expected values under the
uniform and the Yule models of the square of the euclidean cophenetic metric
$d_{\varphi,2}$. The proofs of these formulas are based on long and tedious algebraic
computations and thus, to ease the task of the reader interested only in the
formulas and the path leading to them, but not in the details, we have moved
these computations to an Appendix at the end of the paper.

Besides the aforementioned application of this value in the assessment
of tree comparisons, the knowledge of formulas for the expected value of $d_{\varphi,2}^2$
under different models may allow the use of $d_{\varphi,2}$ to test stochastic models of tree
growth, a popular line of research in recent years which so far has been mostly
based on shape indices; see, for instance, [3], [19]. As a proof of concept, in §4 we
report on a basic, preliminary test of this kind performed on the binary phylogenetic
trees contained in the TreeBASE database [20].

2. Preliminaries 

In this paper, by a phylogenetic tree on a set $S$ of taxa we mean a fully
resolved, or binary, rooted tree with its leaves bijectively labeled in $S$. We
understand such a rooted tree as a directed graph, with its arcs pointing away
from the root. To simplify the language, we shall always identify a leaf of a
phylogenetic tree with its label. We shall also use the term phylogenetic tree
with $n$ leaves to refer to a phylogenetic tree on the set $\{1,\ldots,n\}$. We shall
denote by $\mathcal{T}(S)$ the space of all phylogenetic trees on $S$ and by $\mathcal{T}_n$ the space of
all phylogenetic trees with $n$ leaves.

Let $T$ be a phylogenetic tree. If there exists a directed path from $u$ to $v$ in
$T$, we shall say that $v$ is a descendant of $u$ and also that $u$ is an ancestor of
$v$. The lowest common ancestor $\mathrm{LCA}_T(u,v)$ of a pair of nodes $u,v$ in $T$ is the
unique common ancestor of them that is a descendant of every other common
ancestor of them. The depth $\delta_T(v)$ of a node $v$ in $T$ is the distance (in number
of arcs) from the root of $T$ to $v$. The cophenetic value $\varphi_T(i,j)$ of a pair of leaves
$i,j$ in $T$ is the depth of their LCA. To simplify the notations, we shall often
write $\varphi_T(i,i)$ to denote the depth $\delta_T(i)$ of a leaf $i$.

Given two phylogenetic trees $T, T'$ on disjoint sets of taxa $S, S'$, respectively,
we shall denote by $T\,\widehat{\ }\,T'$ the phylogenetic tree on $S \cup S'$ obtained by connecting
the roots of $T$ and $T'$ to a (new) common root. Every phylogenetic tree $T \in \mathcal{T}_n$
is obtained as $T_k\,\widehat{\ }\,T'_{n-k}$, for some $1 \le k \le n-1$, some subset $S_k \subseteq \{1,\ldots,n\}$
with $k$ elements, some tree $T_k$ on $S_k$ and some tree $T'_{n-k}$ on $S_k^c = \{1,\ldots,n\} \setminus S_k$.
Actually, every phylogenetic tree in $\mathcal{T}_n$ is obtained in this way twice.

The Yule, or Equal-Rate Markov, model of evolution [TU [21] is a stochastic 
model of phylogenetic trees' growth. It starts with a node, and at every step a
leaf is chosen randomly and uniformly and it is split into two leaves. Finally,
the labels are assigned randomly and uniformly to the leaves once the desired
number of leaves is reached. This corresponds to a model of evolution where, at
each step, each currently extant species can give rise, with the same probability,
to two new species. Under this stochastic model, if $T \in \mathcal{T}_n$ is a phylogenetic
tree with set of internal nodes $V_{int}(T)$, and if for every $v \in V_{int}(T)$ we denote
by $\ell_T(v)$ the number of its descendant leaves, then the probability of $T$ is [H |2"7]

\[ P_Y(T) = \frac{2^{n-1}}{n!} \prod_{v \in V_{int}(T)} \frac{1}{\ell_T(v) - 1}. \]
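As a hedged illustration of the product formula for the Yule probability, the following Python sketch evaluates $2^{n-1}/n!$ times the product of $1/(\ell_T(v)-1)$ over the internal nodes. The nested-tuple encoding of trees (a leaf is an integer, an internal node is a pair of subtrees) and the function names are assumptions made here for illustration; they are not taken from the paper or its scripts.

```python
from fractions import Fraction
from math import factorial


def count_leaves(tree):
    """Number of leaves of a nested-tuple tree (a leaf is an int)."""
    if isinstance(tree, int):
        return 1
    return count_leaves(tree[0]) + count_leaves(tree[1])


def yule_probability(tree):
    """P_Y(T) = 2^(n-1)/n! times the product over internal nodes v of 1/(l(v) - 1)."""
    n = count_leaves(tree)
    prob = Fraction(2 ** (n - 1), factorial(n))
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, int):
            continue  # leaves contribute no factor
        prob /= count_leaves(node) - 1
        stack.extend(node)
    return prob


caterpillar = (1, (2, (3, 4)))  # one of the 12 caterpillars with 4 leaves
balanced = ((1, 2), (3, 4))     # one of the 3 fully balanced trees with 4 leaves
print(yule_probability(caterpillar), yule_probability(balanced))
```

With 4 leaves there are 12 labeled caterpillars and 3 labeled balanced trees, so the two values above should add up to 1 when weighted by these counts.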

The uniform, or Proportional to Distinguishable Arrangements, model [22] is
another stochastic model of phylogenetic trees' growth. Unlike the Yule model,
its main feature is that all phylogenetic trees $T \in \mathcal{T}_n$ have the same probability:

\[ P_U(T) = \frac{1}{(2n-3)!!}, \quad\text{where } (2n-3)!! = (2n-3)(2n-5)\cdots 3\cdot 1. \]

From the point of view of tree growth, this model is described as the process
that starts with a node labeled 1 and then, at the $k$-th step, a new pendant arc,
ending in the leaf labeled $k+1$, is added either to a new root (whose other child
will be, then, the original root) or to some edge, with all possible locations of
this new pendant arc being equiprobable [9, 26]. Although this is not an explicit
model of evolution, only of tree growth, several interpretations of it in terms of
evolutionary processes have been given in the literature: see [31, p. 686] and the
references therein.
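The growth process just described can be sketched in Python to confirm that it produces $(2n-3)!!$ distinct trees. This is a minimal sketch under the assumption that trees are encoded as nested tuples (a leaf is an integer, an internal node is a pair of subtrees); the helper names are hypothetical.

```python
def insertions(tree, new_leaf):
    """Yield every tree obtained by hanging new_leaf above tree's root or on one of its edges."""
    # Pairing new_leaf with a whole (sub)tree creates a new parent above it:
    # a new root at the top level, or a subdivision of the edge entering it.
    yield (tree, new_leaf)
    if not isinstance(tree, int):
        left, right = tree
        for grown in insertions(left, new_leaf):
            yield (grown, right)
        for grown in insertions(right, new_leaf):
            yield (left, grown)


def all_trees(n):
    """All phylogenetic trees on {1, ..., n}, built by adding leaves 2, ..., n in turn."""
    trees = [1]
    for leaf in range(2, n + 1):
        trees = [grown for t in trees for grown in insertions(t, leaf)]
    return trees


def clusters(tree):
    """Leaf sets of all nodes; two trees are isomorphic iff their clusters agree."""
    found = set()

    def visit(node):
        if isinstance(node, int):
            below = frozenset([node])
        else:
            below = visit(node[0]) | visit(node[1])
        found.add(below)
        return below

    visit(tree)
    return frozenset(found)


def double_factorial(m):
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


for n in range(2, 7):
    print(n, len(all_trees(n)), double_factorial(2 * n - 3))
```

A tree with $k$ leaves has $2k-1$ insertion positions (its $2k-2$ edges plus the new root), which is why the counts multiply up to $(2n-3)!!$.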



3. Main results 

Let $T \in \mathcal{T}_n$ be a phylogenetic tree with $n$ leaves. The cophenetic vector of
$T$ is

\[ \varphi(T) = \bigl(\varphi_T(i,j)\bigr)_{1 \le i \le j \le n} \in \mathbb{R}^{n(n+1)/2}, \]

with its elements lexicographically ordered in $(i,j)$. It turns out [5] that the
mapping $\varphi : \mathcal{T}_n \to \mathbb{R}^{n(n+1)/2}$ sending each $T \in \mathcal{T}_n$ to its cophenetic vector
$\varphi(T)$ is injective up to isomorphism. As is well known, this allows one to induce
metrics on $\mathcal{T}_n$ from metrics defined on powers of $\mathbb{R}$. In particular, in this paper
we consider the cophenetic metric $d_{\varphi,2}$ on $\mathcal{T}_n$ induced by the euclidean distance:

\[ d_{\varphi,2}(T,T') = \sqrt{\sum_{1 \le i \le j \le n} \bigl(\varphi_T(i,j) - \varphi_{T'}(i,j)\bigr)^2}. \]

To distinguish it from other cophenetic metrics obtained through other $L^p$
norms, we shall call it the euclidean cophenetic metric.

Example 1. Consider the phylogenetic trees $T, T' \in \mathcal{T}_4$ depicted in Fig. 1.
Their total cophenetic vectors are

$\varphi(T) = (2,1,0,0,2,0,0,2,1,2)$
$\varphi(T') = (1,0,0,0,2,1,1,3,2,3)$

and therefore $d_{\varphi,2}(T,T')^2 = 7$. As we shall see below, the expected values of the
square of $d_{\varphi,2}$ on $\mathcal{T}_4$ under the uniform and the Yule models are, respectively,
10.56 and 9.41, and hence these two trees are more similar than average
with respect to the euclidean cophenetic metric under both models.
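The computation in Example 1 can be reproduced with a short Python sketch. Since Fig. 1 is not reproduced in the text, the two tree shapes below are inferred from the vectors given in the example (a balanced tree and a caterpillar); the nested-tuple encoding and the function names are assumptions of this sketch.

```python
from itertools import combinations_with_replacement


def cophenetic_vector(tree, n):
    """Cophenetic vector of a nested-tuple tree on {1, ..., n}, ordered lexicographically in (i, j)."""
    values = {}

    def visit(node, depth):
        if isinstance(node, int):
            values[(node, node)] = depth  # phi_T(i, i) is the depth of leaf i
            return [node]
        left = visit(node[0], depth + 1)
        right = visit(node[1], depth + 1)
        # Leaves on opposite sides of this node have it as their LCA.
        for i in left:
            for j in right:
                values[(min(i, j), max(i, j))] = depth
        return left + right

    visit(tree, 0)
    return [values[pair] for pair in combinations_with_replacement(range(1, n + 1), 2)]


def squared_distance(u, v):
    """Square of the euclidean distance between two cophenetic vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


T = ((1, 2), (3, 4))        # inferred from phi(T) in Example 1
T_prime = (1, (2, (3, 4)))  # inferred from phi(T') in Example 1
u = cophenetic_vector(T, 4)
v = cophenetic_vector(T_prime, 4)
print(u, v, squared_distance(u, v))
```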





Figure 1: Two phylogenetic trees with 4 leaves. 

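Let $D_n^2$ be the random variable that chooses a pair of trees $T, T' \in \mathcal{T}_n$ and
computes $d_{\varphi,2}(T,T')^2$. Its expected values under the Yule and the uniform
models are given by the following two theorems. Recall that the $n$-th harmonic
number $H_n$ is defined as $H_n = \sum_{i=1}^{n} 1/i$.

Theorem 2. For every $n \ge 2$, the expected value of $D_n^2$ under the Yule model
is

\[ E_Y(D_n^2) = \frac{2n}{n-1}\bigl(3n^2 - 10n - 1 + 8(n+1)H_n - 4(n+1)H_n^2\bigr). \]

Theorem 3. For every $n \ge 2$, the expected value of $D_n^2$ under the uniform
model is

\[ E_U(D_n^2) = \frac{1}{3}\bigl(4n^3 + 18n^2 - 10n\bigr) - \frac{n(n+3)}{2}\cdot\frac{(2n-2)!!}{(2n-3)!!} - \frac{n(n+7)}{4}\left(\frac{(2n-2)!!}{(2n-3)!!}\right)^2. \]

Since $H_n \sim \ln(n)$ and $(2n-2)!!/(2n-3)!! \sim \sqrt{\pi n}$, these formulas imply that

\[ E_Y(D_n^2) \sim 6n^2, \qquad E_U(D_n^2) \sim \Bigl(\frac{4}{3} - \frac{\pi}{4}\Bigr) n^3. \]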
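To make the two closed formulas easy to evaluate, here is a hedged Python sketch using exact rational arithmetic. The Yule expression is the one in Theorem 2; the uniform expression is written as the combination, through Proposition 4, of the expected values of $S_n$, $\Phi_n$ and $\overline{\Phi}^{(2)}_n$ quoted in this section, and it was cross-checked against the values reported in Table 1. Function names are hypothetical.

```python
from fractions import Fraction


def harmonic(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))


def double_factorial(m):
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def expected_yule(n):
    """E_Y(D_n^2) as in Theorem 2."""
    h = harmonic(n)
    return Fraction(2 * n, n - 1) * (3 * n * n - 10 * n - 1 + 8 * (n + 1) * h - 4 * (n + 1) * h * h)


def expected_uniform(n):
    """E_U(D_n^2), with r = (2n-2)!!/(2n-3)!!."""
    r = Fraction(double_factorial(2 * n - 2), double_factorial(2 * n - 3))
    return (Fraction(4 * n ** 3 + 18 * n ** 2 - 10 * n, 3)
            - Fraction(n * (n + 3), 2) * r
            - Fraction(n * (n + 7), 4) * r * r)


for n in range(3, 8):
    print(n, float(expected_yule(n)), float(expected_uniform(n)))
```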

We shall prove the formulas in Theorems 2 and 3 by reducing the
computation of the expected value of $D_n^2$ to that of the following random variables:

• $S_n$, the random variable that chooses a tree $T \in \mathcal{T}_n$ and computes its
Sackin index $S$ [53], defined by

\[ S(T) = \sum_{i=1}^{n} \delta_T(i) \]

• $\Phi_n$, the random variable that chooses a tree $T \in \mathcal{T}_n$ and computes its total
cophenetic index $\Phi$ [18], defined by

\[ \Phi(T) = \sum_{1 \le i < j \le n} \varphi_T(i,j) \]

• $\overline{\Phi}^{(2)}_n$, the random variable that chooses a tree $T \in \mathcal{T}_n$ and computes

\[ \overline{\Phi}^{(2)}(T) = \sum_{1 \le i \le j \le n} \varphi_T(i,j)^2 \]

For the models under consideration, the expected values of these variables are
related to that of $D_n^2$ by the next proposition. In it and henceforth, we shall
denote by $E(X)$ the expected value of a random variable $X$ on $\mathcal{T}_n$ under a generic
probability distribution $p : \mathcal{T}_n \to [0,1]$ on $\mathcal{T}_n$ invariant under relabelings. The
probability distributions $p_Y$ and $p_U$ defined by the Yule and the uniform models,
respectively, are invariant under relabelings, and therefore the expected values
under these specific models, which will be denoted by $E_Y$ and $E_U$, respectively,
are special cases of $E$.

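Proposition 4. \[ E(D_n^2) = 2E(\overline{\Phi}^{(2)}_n) - 2\cdot\frac{E(S_n)^2}{n} - 4\cdot\frac{E(\Phi_n)^2}{n(n-1)}. \]

Proof. To simplify the notations, let

• $\varphi_n$ be the random variable that chooses a tree $T \in \mathcal{T}_n$ and computes
$\varphi_T(1,2)$.

• $\delta_n$ be the random variable that chooses a tree $T \in \mathcal{T}_n$ and computes $\delta_T(1)$.

Let us compute now $E(D_n^2)$ from its very definition:

\begin{align*}
E(D_n^2) &= \sum_{(T,T') \in \mathcal{T}_n^2} d_{\varphi,2}(T,T')^2\, p(T)p(T') \\
&= \sum_{(T,T') \in \mathcal{T}_n^2} \Bigl(\sum_{1 \le i \le j \le n} \bigl(\varphi_T(i,j) - \varphi_{T'}(i,j)\bigr)^2\Bigr) p(T)p(T') \\
&= \sum_{1 \le i \le j \le n}\ \sum_{(T,T') \in \mathcal{T}_n^2} \bigl(\varphi_T(i,j)^2 + \varphi_{T'}(i,j)^2 - 2\varphi_T(i,j)\varphi_{T'}(i,j)\bigr) p(T)p(T') \\
&= \sum_{1 \le i \le j \le n} \Bigl(\sum_{(T,T') \in \mathcal{T}_n^2} \varphi_T(i,j)^2 p(T)p(T') + \sum_{(T,T') \in \mathcal{T}_n^2} \varphi_{T'}(i,j)^2 p(T)p(T') \\
&\qquad\qquad - 2\sum_{(T,T') \in \mathcal{T}_n^2} \varphi_T(i,j)\varphi_{T'}(i,j)\, p(T)p(T')\Bigr) \\
&= \sum_{1 \le i \le j \le n} \Bigl(\sum_{T \in \mathcal{T}_n} \varphi_T(i,j)^2 p(T) + \sum_{T' \in \mathcal{T}_n} \varphi_{T'}(i,j)^2 p(T') \\
&\qquad\qquad - 2\Bigl(\sum_{T \in \mathcal{T}_n} \varphi_T(i,j) p(T)\Bigr)\Bigl(\sum_{T' \in \mathcal{T}_n} \varphi_{T'}(i,j) p(T')\Bigr)\Bigr) \\
&= \sum_{1 \le i \le j \le n} \Bigl(2\sum_{T \in \mathcal{T}_n} \varphi_T(i,j)^2 p(T) - 2\Bigl(\sum_{T \in \mathcal{T}_n} \varphi_T(i,j) p(T)\Bigr)^2\Bigr) \\
&= 2\sum_{T \in \mathcal{T}_n} \Bigl(\sum_{1 \le i \le j \le n} \varphi_T(i,j)^2\Bigr) p(T) - 2\sum_{1 \le i \le j \le n} \Bigl(\sum_{T \in \mathcal{T}_n} \varphi_T(i,j) p(T)\Bigr)^2 \\
&= 2\sum_{T \in \mathcal{T}_n} \overline{\Phi}^{(2)}(T) p(T) - 2\binom{n}{2}\Bigl(\sum_{T \in \mathcal{T}_n} \varphi_T(1,2) p(T)\Bigr)^2 - 2n\Bigl(\sum_{T \in \mathcal{T}_n} \delta_T(1) p(T)\Bigr)^2 \\
&= 2E(\overline{\Phi}^{(2)}_n) - n(n-1)E(\varphi_n)^2 - 2nE(\delta_n)^2.
\end{align*}

Now, the values of $E(\delta_n)$ and $E(\varphi_n)$ can be easily obtained from $E(S_n)$ and
$E(\Phi_n)$, respectively, using the invariance under relabelings of the probability
distribution under which we compute the expected values $E$:

\[ E(\delta_n) = E(S_n)/n, \qquad E(\varphi_n) = E(\Phi_n)\Big/\binom{n}{2}. \]

The formula in the statement is then obtained by replacing $E(\delta_n)$ and $E(\varphi_n)$
by these values. □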
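As a numerical sanity check of Proposition 4, the following Python sketch enumerates all 15 trees with 4 leaves and compares, under the uniform distribution, the direct average of the squared distance over all ordered pairs of trees with the combination $2E(\overline{\Phi}^{(2)}_n) - 2E(S_n)^2/n - 4E(\Phi_n)^2/(n(n-1))$. The nested-tuple tree encoding and all names are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def insertions(tree, new_leaf):
    """Trees obtained by adding new_leaf above the root of tree or on one of its edges."""
    yield (tree, new_leaf)
    if not isinstance(tree, int):
        for grown in insertions(tree[0], new_leaf):
            yield (grown, tree[1])
        for grown in insertions(tree[1], new_leaf):
            yield (tree[0], grown)


def all_trees(n):
    trees = [1]
    for leaf in range(2, n + 1):
        trees = [grown for t in trees for grown in insertions(t, leaf)]
    return trees


def cophenetic_vector(tree, pairs):
    values = {}

    def visit(node, depth):
        if isinstance(node, int):
            values[(node, node)] = depth
            return [node]
        left, right = visit(node[0], depth + 1), visit(node[1], depth + 1)
        for i in left:
            for j in right:
                values[(min(i, j), max(i, j))] = depth
        return left + right

    visit(tree, 0)
    return [values[pair] for pair in pairs]


n = 4
pairs = list(combinations_with_replacement(range(1, n + 1), 2))
vectors = [cophenetic_vector(t, pairs) for t in all_trees(n)]
count = len(vectors)

# Left-hand side: mean of d(T, T')^2 over all ordered pairs of trees.
direct = Fraction(sum(sum((a - b) ** 2 for a, b in zip(u, v))
                      for u in vectors for v in vectors), count ** 2)

# Right-hand side: the three expected values entering Proposition 4.
mean_phi2 = Fraction(sum(x * x for v in vectors for x in v), count)
mean_sackin = Fraction(sum(x for v in vectors for (i, j), x in zip(pairs, v) if i == j), count)
mean_coph = Fraction(sum(x for v in vectors for (i, j), x in zip(pairs, v) if i < j), count)
via_proposition = 2 * mean_phi2 - 2 * mean_sackin ** 2 / n - 4 * mean_coph ** 2 / (n * (n - 1))

print(direct, via_proposition)
```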

The expected values of $S_n$ and $\Phi_n$ under the Yule and the uniform models
are known:

\[ E_Y(S_n) = 2n(H_n - 1), \qquad E_U(S_n) = n\left(\frac{(2n-2)!!}{(2n-3)!!} - 1\right), \]

\[ E_Y(\Phi_n) = n(n-1) - 2n(H_n - 1), \qquad E_U(\Phi_n) = \frac{1}{2}\binom{n}{2}\left(\frac{(2n-2)!!}{(2n-3)!!} - 2\right). \]

The formula for $E_Y(S_n)$ was proved in [Tj)| and the other three, in [15].

To obtain the expected values of $D_n^2$, it remains to compute the expected
values of $\overline{\Phi}^{(2)}_n$. They are given by the following result.

Proposition 5. For every $n \ge 2$:

(a) $E_Y(\overline{\Phi}^{(2)}_n) = 5n(n-1) - 8n(H_n - 1)$

(b) $E_U(\overline{\Phi}^{(2)}_n) = \dfrac{1}{6}n(4n^2 + 21n - 7) - \dfrac{3}{4}n(n+3)\cdot\dfrac{(2n-2)!!}{(2n-3)!!}$

This proposition is proved in the Appendix at the end of this paper. Finally,
the identities given in Theorems 2 and 3 are obtained by replacing, in the identity
given in Proposition 4, $E(\overline{\Phi}^{(2)}_n)$, $E(S_n)$, and $E(\Phi_n)$ by their values under the
Yule and the uniform models, respectively. We leave the last details to the
reader.

4. An experiment on TreeBASE 

In this section we report on a very simple experiment to show how $d_{\varphi,2}$ can
be used to test evolutionary hypotheses. In this experiment, we have compared
the expected value of $d_{\varphi,2}^2$ on $\mathcal{T}_n$ under the uniform and the Yule models with
its average value on the set $\mathrm{TreeBASE}_{bin,n}$ of binary phylogenetic trees with $n$
leaves contained in TreeBASE [20].

To perform this experiment, we have made some decisions. First, since there
are only very few values $n > 50$ such that $|\mathrm{TreeBASE}_{bin,n}| > 10$, we have decided
to consider only those binary trees contained in TreeBASE with $n \le 50$ leaves.
On the other hand, even for those $n$ such that $\mathrm{TreeBASE}_{bin,n}$ is relatively large,
in most cases it does not contain many pairs of trees with the same taxa. So,
instead of computing the average value of $d_{\varphi,2}^2$ on $\mathrm{TreeBASE}_{bin,n}$ by averaging
the values $d_{\varphi,2}^2(T,T')$ for pairs of trees $T, T'$ with exactly the same $n$ taxa, we
have made use of the formula given in Proposition 4 as if $\mathrm{TreeBASE}_{bin,n}$ was
closed under relabelings: that is, we have taken into account only the shapes of
the trees contained in it. This is consistent with the fact that our final goal is
to test models of evolution that produce tree shapes.

So, we have computed the average values of $\overline{\Phi}^{(2)}$, of the Sackin index $S$, and
of the total cophenetic index $\Phi$ on $\mathrm{TreeBASE}_{bin,n}$, and we have taken as average
value of $d_{\varphi,2}^2$ on this set the result of applying the formula in Proposition 4. The
detailed results of these computations, as well as the Python and R scripts used
to compute and analyze them, are available in the Supplementary Material web
page http://bioinfo.uib.es/~recerca/phylotrees/expectedcophdist/.

Fig. 2 plots the log of these average values as a function of $\log(n)$. We have
added the curves of the log of the expected values of $D_n^2$ under the Yule
distribution (lower, dotted curve) and under the uniform distribution (upper, dashed
curve), again as a function of $\log(n)$. The graphic shows that the expected
value of $d_{\varphi,2}^2$ on (the shapes of) the phylogenetic trees contained in TreeBASE
is better explained by the uniform model than by the Yule model. This agrees
with the results of similar experiments using other measures (see, for instance,

mm)- 

5. Conclusions and discussion 

In this paper we have obtained formulas for the expected values under the
Yule and the uniform models of the square of the euclidean cophenetic metric
$d_{\varphi,2}$, defined by the euclidean distance between cophenetic vectors. These
formulas are explicit and hold on spaces $\mathcal{T}_n$ of fully resolved phylogenetic trees
with any number $n$ of leaves.
Figure 2: Log-log plots of the mean of $D_n^2$ for the binary trees in TreeBASE with a fixed
number $n$ of leaves, of $E_Y(D_n^2)$ (dotted curve) and $E_U(D_n^2)$ (dashed curve).



These formulas have been obtained through long algebraic manipulations of
sums of sequences. To double-check our results, we have computed the exact
value of $E_Y(D_n^2)$ and $E_U(D_n^2)$ for $n = 3, \ldots, 7$, by generating all trees with up to
7 leaves. Moreover, we have computed numerical approximations to these values
for $n = 10, 20, \ldots, 100$, by generating pairs of random trees until the
numerical method stabilizes. These numerical experiments confirm that our formulas
give the right figures. Table 1 gives the exact values for $n = 3, \ldots, 7$. The
results of the simulations for $n = 10, 20, \ldots, 100$, as well as the Python scripts
used in these computations, are also available in the aforementioned
Supplementary Material web page http://bioinfo.uib.es/~recerca/phylotrees/expectedcophdist/.
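The simulation-based check described above can be sketched as follows. This is not the authors' script: it is a minimal Monte Carlo sketch that samples Yule trees through the standard recursive description of the model (the root splits the $n$ labels into two blocks whose sizes are uniform on $1, \ldots, n-1$, with the labels of each block chosen uniformly), which is an assumption of this sketch, and compares the sample mean of the squared distance with the closed formula of Theorem 2. The seed, sample size and names are arbitrary illustrative choices.

```python
import random
from itertools import combinations_with_replacement


def random_yule_tree(labels, rng):
    """Sample a Yule tree on the given labels via uniform root splits (assumed sampler)."""
    if len(labels) == 1:
        return labels[0]
    labels = list(labels)
    rng.shuffle(labels)
    k = rng.randint(1, len(labels) - 1)  # size of the first block, uniform on 1..n-1
    return (random_yule_tree(labels[:k], rng), random_yule_tree(labels[k:], rng))


def cophenetic_vector(tree, n):
    values = {}

    def visit(node, depth):
        if isinstance(node, int):
            values[(node, node)] = depth
            return [node]
        left, right = visit(node[0], depth + 1), visit(node[1], depth + 1)
        for i in left:
            for j in right:
                values[(min(i, j), max(i, j))] = depth
        return left + right

    visit(tree, 0)
    return [values[p] for p in combinations_with_replacement(range(1, n + 1), 2)]


def estimate_yule_mean(n, samples, seed=0):
    """Monte Carlo estimate of E_Y(D_n^2) from independent pairs of Yule trees."""
    rng = random.Random(seed)
    labels = list(range(1, n + 1))
    total = 0
    for _ in range(samples):
        u = cophenetic_vector(random_yule_tree(labels, rng), n)
        v = cophenetic_vector(random_yule_tree(labels, rng), n)
        total += sum((a - b) ** 2 for a, b in zip(u, v))
    return total / samples


def expected_yule(n):
    """Closed formula of Theorem 2, in floating point."""
    h = sum(1.0 / k for k in range(1, n + 1))
    return 2 * n / (n - 1) * (3 * n * n - 10 * n - 1 + 8 * (n + 1) * h - 4 * (n + 1) * h * h)


estimate = estimate_yule_mean(10, 4000)
print(estimate, expected_yule(10))
```

The two printed numbers should be close but not identical, since the first one is a random estimate.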
n             3          4          5          6          7
E_Y(D_n^2)    2.66667    9.40741    21.1833    38.712     62.5562
E_U(D_n^2)    2.66667    10.56      26.2367    52.3023    91.4086

Table 1: Values of $E_Y(D_n^2)$ and $E_U(D_n^2)$ for $n = 3, \ldots, 7$. They agree with those given by
our formulas.


The formulas for $E_Y(D_n^2)$ and $E_U(D_n^2)$ grow in different orders: $E_Y(D_n^2)$ is
in $O(n^2)$, while $E_U(D_n^2)$ is in $O(n^3)$. Therefore, they can be used to test the Yule
and the uniform models as null stochastic models of evolution for collections of
phylogenetic trees reconstructed by different methods. We have reported on a
first experiment of this type, which reinforces the conclusion that "real world"
phylogenetic trees (that is, those contained in TreeBASE) are not consistent
with the Yule model of evolution. We plan to report in a future paper on more
extensive tests on stochastic models of evolutionary processes, including Ford's
$\alpha$-model [11] and Aldous' $\beta$-model [2].
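Appendix: Proof of Proposition 5

Proof of Proposition 5(a)
For every $T \in \mathcal{T}_n$, let

\[ \overline{\Phi}(T) = S(T) + \Phi(T) = \sum_{1 \le i \le j \le n} \varphi_T(i,j), \]

and let $\overline{\Phi}_n$ be the random variable that chooses a tree $T \in \mathcal{T}_n$ and computes
$\overline{\Phi}(T)$. We have that

\[ E_Y(\overline{\Phi}_n) = E_Y(S_n) + E_Y(\Phi_n) = n(n-1). \]

To compute $E_Y(\overline{\Phi}^{(2)}_n)$, we shall use an argument similar to the one used in
the proof of [5, Prop. 3]. Notice that

\begin{align*}
E_Y(\overline{\Phi}^{(2)}_n) &= \sum_{T \in \mathcal{T}_n} \overline{\Phi}^{(2)}(T)\cdot p_Y(T) \\
&= \frac{1}{2}\sum_{k=1}^{n-1}\ \sum_{\substack{S_k \subsetneq \{1,\ldots,n\} \\ |S_k| = k}}\ \sum_{T_k \in \mathcal{T}(S_k)}\ \sum_{T'_{n-k} \in \mathcal{T}(S_k^c)} \overline{\Phi}^{(2)}(T_k\,\widehat{\ }\,T'_{n-k})\cdot p_Y(T_k\,\widehat{\ }\,T'_{n-k}).
\end{align*}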

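Now, on the one hand, we have the following easy lemma on $P_Y(T\,\widehat{\ }\,T')$: see [7,
Lem. 1].

Lemma 6. Let $\emptyset \ne S_k \subsetneq \{1,\ldots,n\}$ with $|S_k| = k$, let $T_k \in \mathcal{T}(S_k)$ and $T'_{n-k} \in
\mathcal{T}(S_k^c)$. Then,

\[ P_Y(T_k\,\widehat{\ }\,T'_{n-k}) = \frac{2}{(n-1)\binom{n}{k}}\, P_Y(T_k) P_Y(T'_{n-k}). \]

On the other hand, we have the following recursive expression for $\overline{\Phi}^{(2)}(T\,\widehat{\ }\,T')$.

Lemma 7. Let $\emptyset \ne S_k \subsetneq \{1,\ldots,n\}$ with $|S_k| = k$, let $T_k \in \mathcal{T}(S_k)$ and $T'_{n-k} \in
\mathcal{T}(S_k^c)$. Then

\[ \overline{\Phi}^{(2)}(T_k\,\widehat{\ }\,T'_{n-k}) = \overline{\Phi}^{(2)}(T_k) + \overline{\Phi}^{(2)}(T'_{n-k}) + 2\overline{\Phi}(T_k) + 2\overline{\Phi}(T'_{n-k}) + \binom{k+1}{2} + \binom{n-k+1}{2}. \]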

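Proof. Let us assume, without any loss of generality, that $S_k = \{1,\ldots,k\}$ and
$S_k^c = \{k+1,\ldots,n\}$. Then

\[ \varphi_{T_k\widehat{\ }T'_{n-k}}(i,j) = \begin{cases} \varphi_{T_k}(i,j) + 1 & \text{if } 1 \le i,j \le k \\ \varphi_{T'_{n-k}}(i,j) + 1 & \text{if } k+1 \le i,j \le n \\ 0 & \text{otherwise} \end{cases} \]

and therefore

\begin{align*}
\overline{\Phi}^{(2)}(T_k\,\widehat{\ }\,T'_{n-k}) &= \sum_{1 \le i \le j \le k} \bigl(\varphi_{T_k}(i,j)^2 + 2\varphi_{T_k}(i,j) + 1\bigr) + \sum_{k+1 \le i \le j \le n} \bigl(\varphi_{T'_{n-k}}(i,j)^2 + 2\varphi_{T'_{n-k}}(i,j) + 1\bigr) \\
&= \overline{\Phi}^{(2)}(T_k) + 2\overline{\Phi}(T_k) + \binom{k+1}{2} + \overline{\Phi}^{(2)}(T'_{n-k}) + 2\overline{\Phi}(T'_{n-k}) + \binom{n-k+1}{2}.
\end{align*}

□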
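As a small worked instance of Lemma 7, added here for illustration, take $n = 4$, $k = 2$, and let $T_2$ and $T'_2$ be the cherries on $\{1,2\}$ and $\{3,4\}$. In a cherry both leaves have depth 1 and the pair of leaves has cophenetic value 0, so $\overline{\Phi}^{(2)} = 2$ and $\overline{\Phi} = 2$ for each of them. The recursive expression (the two $\overline{\Phi}^{(2)}$ terms, twice the two $\overline{\Phi}$ terms, and the binomials $\binom{k+1}{2}$ and $\binom{n-k+1}{2}$) then agrees with the direct sum over the balanced tree with 4 leaves, which has four leaves of depth 2, two pairs with cophenetic value 1 and four pairs with cophenetic value 0:

```latex
\begin{align*}
\overline{\Phi}^{(2)}(T_2\,\widehat{\ }\,T'_2)
  &= 2 + 2 + 2\cdot 2 + 2\cdot 2 + \binom{3}{2} + \binom{3}{2} = 18, \\
\sum_{1 \le i \le j \le 4} \varphi_{T_2\widehat{\ }T'_2}(i,j)^2
  &= 4\cdot 2^2 + 2\cdot 1^2 + 4\cdot 0^2 = 18.
\end{align*}
```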

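So, if we set $f(a,b) = \binom{a+1}{2} + \binom{b+1}{2}$, we have that

\begin{align*}
E_Y(\overline{\Phi}^{(2)}_n) &= \frac{1}{2}\sum_{k=1}^{n-1} \binom{n}{k} \sum_{T_k \in \mathcal{T}_k}\ \sum_{T'_{n-k} \in \mathcal{T}_{n-k}} \Bigl[\overline{\Phi}^{(2)}(T_k) + \overline{\Phi}^{(2)}(T'_{n-k}) + 2\bigl(\overline{\Phi}(T_k) + \overline{\Phi}(T'_{n-k})\bigr) \\
&\qquad\qquad + f(k,n-k)\Bigr] \frac{2}{(n-1)\binom{n}{k}} P_Y(T_k) P_Y(T'_{n-k}) \\
&= \frac{1}{n-1}\sum_{k=1}^{n-1} \Bigl[\sum_{T_k}\sum_{T'_{n-k}} \overline{\Phi}^{(2)}(T_k) P_Y(T_k) P_Y(T'_{n-k}) + \sum_{T_k}\sum_{T'_{n-k}} \overline{\Phi}^{(2)}(T'_{n-k}) P_Y(T_k) P_Y(T'_{n-k}) \\
&\qquad\qquad + 2\sum_{T_k}\sum_{T'_{n-k}} \bigl(\overline{\Phi}(T_k) + \overline{\Phi}(T'_{n-k})\bigr) P_Y(T_k) P_Y(T'_{n-k}) \\
&\qquad\qquad + \sum_{T_k}\sum_{T'_{n-k}} f(k,n-k) P_Y(T_k) P_Y(T'_{n-k})\Bigr] \\
&= \frac{1}{n-1}\sum_{k=1}^{n-1} \Bigl[\sum_{T_k} \overline{\Phi}^{(2)}(T_k) P_Y(T_k) + \sum_{T'_{n-k}} \overline{\Phi}^{(2)}(T'_{n-k}) P_Y(T'_{n-k}) \\
&\qquad\qquad + 2\sum_{T_k} \overline{\Phi}(T_k) P_Y(T_k) + 2\sum_{T'_{n-k}} \overline{\Phi}(T'_{n-k}) P_Y(T'_{n-k}) + f(k,n-k)\Bigr] \\
&= \frac{1}{n-1}\sum_{k=1}^{n-1} \Bigl[E_Y(\overline{\Phi}^{(2)}_k) + E_Y(\overline{\Phi}^{(2)}_{n-k}) + 2E_Y(\overline{\Phi}_k) + 2E_Y(\overline{\Phi}_{n-k}) \\
&\qquad\qquad + \binom{k+1}{2} + \binom{n-k+1}{2}\Bigr] \\
&= \frac{2}{n-1}\sum_{k=1}^{n-1} E_Y(\overline{\Phi}^{(2)}_k) + \frac{4}{n-1}\sum_{k=1}^{n-1} E_Y(\overline{\Phi}_k) + \frac{1}{3}n(n+1).
\end{align*}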
fe=l fc=l 
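In particular

\[ E_Y(\overline{\Phi}^{(2)}_{n-1}) = \frac{2}{n-2}\sum_{k=1}^{n-2} E_Y(\overline{\Phi}^{(2)}_k) + \frac{4}{n-2}\sum_{k=1}^{n-2} E_Y(\overline{\Phi}_k) + \frac{1}{3}(n-1)n \]

and therefore

\begin{align*}
E_Y(\overline{\Phi}^{(2)}_n) &= \frac{n-2}{n-1}\cdot\frac{2}{n-2}\sum_{k=1}^{n-2} E_Y(\overline{\Phi}^{(2)}_k) + \frac{2}{n-1}E_Y(\overline{\Phi}^{(2)}_{n-1}) \\
&\qquad + \frac{n-2}{n-1}\cdot\frac{4}{n-2}\sum_{k=1}^{n-2} E_Y(\overline{\Phi}_k) + \frac{4}{n-1}E_Y(\overline{\Phi}_{n-1}) \\
&\qquad + \frac{n-2}{n-1}\cdot\frac{1}{3}n(n-1) + n \\
&= \frac{n-2}{n-1}E_Y(\overline{\Phi}^{(2)}_{n-1}) + \frac{2}{n-1}E_Y(\overline{\Phi}^{(2)}_{n-1}) + \frac{4}{n-1}E_Y(\overline{\Phi}_{n-1}) + n \\
&= \frac{n}{n-1}E_Y(\overline{\Phi}^{(2)}_{n-1}) + 5n - 8,
\end{align*}

where the last step uses $E_Y(\overline{\Phi}_{n-1}) = (n-1)(n-2)$.

Setting $x_n = E_Y(\overline{\Phi}^{(2)}_n)/n$, this recurrence becomes

\[ x_n = x_{n-1} + 5 - \frac{8}{n} \]

and the solution of this recursive equation with $x_1 = E_Y(\overline{\Phi}^{(2)}_1) = 0$ is

\[ x_n = \sum_{k=2}^{n} \Bigl(5 - \frac{8}{k}\Bigr) = 5(n-1) - 8(H_n - 1) = 5n + 3 - 8H_n \]

from where we deduce that $E_Y(\overline{\Phi}^{(2)}_n) = 5n^2 + 3n - 8nH_n$, as we claimed.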
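The recurrence and its closed-form solution can be cross-checked numerically. The following Python sketch (function names are illustrative) iterates $E_n = \frac{n}{n-1}E_{n-1} + 5n - 8$ from $E_1 = 0$ with exact fractions and compares the result with $5n^2 + 3n - 8nH_n$.

```python
from fractions import Fraction


def second_moment_by_recurrence(n_max):
    """E_Y of the sum of squared cophenetic values for n = 1..n_max, via the recurrence."""
    values = {1: Fraction(0)}
    for n in range(2, n_max + 1):
        values[n] = Fraction(n, n - 1) * values[n - 1] + 5 * n - 8
    return values


def second_moment_closed(n):
    """Closed form 5n^2 + 3n - 8n*H_n, with H_n the n-th harmonic number."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return 5 * n * n + 3 * n - 8 * n * harmonic


by_recurrence = second_moment_by_recurrence(30)
for n in (2, 3, 4, 10):
    print(n, by_recurrence[n], second_moment_closed(n))
```

For $n = 3$ both expressions give 10, which is the value of the sum of squared cophenetic values on every tree with 3 leaves (all of them are caterpillars).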
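Proof of Proposition 5(b)

To compute $E_U(\overline{\Phi}^{(2)}_n)$, we shall use an argument similar to the one used in
[17]. For every $k = 1, \ldots, n-1$, let

\begin{align*}
f_{k,n} &= |\{T \in \mathcal{T}_n \mid \varphi_T(1,2) = k\}| \\
&= |\{T \in \mathcal{T}_n \mid \varphi_T(i,j) = k\}| \quad\text{for every } 1 \le i < j \le n \\
d_{k,n} &= |\{T \in \mathcal{T}_n \mid \delta_T(1) = k\}| \\
&= |\{T \in \mathcal{T}_n \mid \delta_T(i) = k\}| \quad\text{for every } 1 \le i \le n
\end{align*}

(where $|X|$ denotes the cardinal of the set $X$).

Lemma 8. For every $n \ge 2$,

\[ E_U(\overline{\Phi}^{(2)}_n) = \frac{1}{(2n-3)!!}\Bigl(n\sum_{k=1}^{n-1} k^2\cdot d_{k,n} + \binom{n}{2}\sum_{k=1}^{n-2} k^2\cdot f_{k,n}\Bigr). \]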
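Proof. Under the uniform model,

\[ E_U(\overline{\Phi}^{(2)}_n) = \frac{\sum_{T \in \mathcal{T}_n} \overline{\Phi}^{(2)}(T)}{(2n-3)!!}, \]

where

\begin{align*}
\sum_{T \in \mathcal{T}_n} \overline{\Phi}^{(2)}(T) &= \sum_{T \in \mathcal{T}_n}\ \sum_{1 \le i \le j \le n} \varphi_T(i,j)^2 = \sum_{1 \le i \le j \le n}\ \sum_{T \in \mathcal{T}_n} \varphi_T(i,j)^2 \\
&= \sum_{1 \le i \le n}\ \sum_{T \in \mathcal{T}_n} \delta_T(i)^2 + \sum_{1 \le i < j \le n}\ \sum_{T \in \mathcal{T}_n} \varphi_T(i,j)^2 \\
&= \sum_{1 \le i \le n}\ \sum_{k=1}^{n-1} k^2\cdot|\{T \in \mathcal{T}_n \mid \delta_T(i) = k\}| + \sum_{1 \le i < j \le n}\ \sum_{k=1}^{n-2} k^2\cdot|\{T \in \mathcal{T}_n \mid \varphi_T(i,j) = k\}| \\
&= \sum_{1 \le i \le n}\ \sum_{k=1}^{n-1} k^2\cdot d_{k,n} + \sum_{1 \le i < j \le n}\ \sum_{k=1}^{n-2} k^2\cdot f_{k,n} \\
&= n\sum_{k=1}^{n-1} k^2\cdot d_{k,n} + \binom{n}{2}\sum_{k=1}^{n-2} k^2\cdot f_{k,n}.
\end{align*}

□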



A formula for $d_{k,n}$ was obtained in the proof of [15, Lem. 21]:

\[ d_{k,n} = \frac{(2n-k-3)!\cdot k}{(n-k-1)!\,2^{n-k-1}} \qquad (1) \]

As far as $f_{k,n}$ goes, we have the following result. In it, and henceforth, ${}_pF_q$
denotes the (generalized) hypergeometric function defined by

\[ {}_pF_q\!\left(\begin{matrix} a_1, \ldots, a_p \\ b_1, \ldots, b_q \end{matrix};\, z\right) = \sum_{k \ge 0} \frac{(a_1)_k \cdots (a_p)_k}{(b_1)_k \cdots (b_q)_k}\cdot\frac{z^k}{k!}, \]

where $(a)_0 = 1$ and $(a)_k := a\cdot(a+1)\cdots(a+k-1)$ for $k \ge 1$.

Lemma 9. For every $n \ge 2$, $f_{0,n} = (2n-4)!!$ and

\[ f_{k,n} = \frac{(2n-k-5)!\,k}{(2n-2k-4)!!}\cdot {}_3F_2\!\left(\begin{matrix} 1,\ 2-n,\ k+2-n \\ \frac{k+5}{2}-n,\ \frac{k}{2}-n+3 \end{matrix};\, 1\right) \]

for every $k = 1, \ldots, n-2$.

Proof. Let us start by proving $f_{0,n} = (2n-4)!!$ by induction on $n$. It is clear
that $f_{0,2} = 1 = (2\cdot 2 - 4)!!$. Assume now that $f_{0,n-1} = (2(n-1) - 4)!!$.
Every phylogenetic tree $T$ with $n$ leaves such that $\varphi_T(1,2) = 0$, that is, where
$\mathrm{LCA}_T(1,2)$ is the root, is obtained by taking a phylogenetic tree $T'$ with $n-1$
leaves such that $\varphi_{T'}(1,2) = 0$ and adding a new pendant edge, ending in the leaf
$n$, to any edge in $T'$. Then, since there are $f_{0,n-1} = (2n-6)!!$ trees $T' \in \mathcal{T}_{n-1}$
such that $\varphi_{T'}(1,2) = 0$, and each one of them has $2(n-1) - 2$ edges where we
can add the new edge, we obtain

\[ f_{0,n} = (2n-4)(2n-6)!! = (2n-4)!!. \]

Now, to compute $f_{k,n}$ for $k \ge 1$, we shall study the structure of a tree $T \in \mathcal{T}_n$
such that $\varphi_T(1,2) = k$; to simplify the notations, let us denote by $x$ the node
$\mathrm{LCA}_T(1,2)$, which has depth $k$, and by $T_0$ the subtree of $T$ rooted at $x$.

Then, on the one hand, $T_0$ is a phylogenetic tree on a subset $S_0 \subseteq \{1,\ldots,n\}$
containing 1, 2, and since its root $x$ is the LCA of 1 and 2 in $T$, we have that
$\varphi_{T_0}(1,2) = 0$. On the other hand, there is a path $(r = v_1, v_2, v_3, \ldots, v_{k+1} = x)$
in $T$ from the root $r$ to $x$. For every $j = 1, \ldots, k$, let $T_j$ be the subtree rooted at the
child of $v_j$ other than $v_{j+1}$; see Fig. 3.

So, the tree $T$ is determined by:

• A number $0 \le m \le n-k-2$, so that $m+2$ will be the number of leaves
of the phylogenetic tree $T_0$ rooted at $\mathrm{LCA}_T(1,2)$

• A subset $\{i_1, \ldots, i_m\}$ of $\{3, \ldots, n\}$. There are $\binom{n-2}{m}$ such subsets.

• A phylogenetic tree $T_0$ on $\{1, 2, i_1, \ldots, i_m\}$ such that $\varphi_{T_0}(1,2) = 0$. There
are $f_{0,m+2} = (2m)!!$ such trees.

• An ordered $k$-forest, that is, an ordered sequence of phylogenetic trees
$(T_1, T_2, \ldots, T_k)$ such that $\bigcup_{j=1}^{k} L(T_j) = \{1,\ldots,n\} \setminus \{1, 2, i_1, \ldots, i_m\}$. The
number of such ordered $k$-forests is (see, for instance, (TTJ Lem. 1])

\[ \frac{(2n-2m-k-5)!\,k}{(n-m-k-2)!\,2^{n-m-k-2}}. \]

Figure 3: The structure of a tree $T$ with $\varphi_T(1,2) = k$.
This shows that $f_{k,n}$ can be computed as

\begin{align*}
f_{k,n} &= \sum_{m=0}^{n-k-2} (\text{number of ways of choosing } \{i_1, \ldots, i_m\}) \\
&\qquad\qquad \cdot (\text{number of trees in } \mathcal{T}_{m+2} \text{ with } \varphi_T(1,2) = 0) \\
&\qquad\qquad \cdot (\text{number of ordered } k\text{-forests on } n-m-2 \text{ leaves}) \\
&= \sum_{m=0}^{n-k-2} \binom{n-2}{m}\cdot(2m)!!\cdot\frac{(2n-2m-k-5)!\,k}{(n-m-k-2)!\,2^{n-m-k-2}} \\
&= k\sum_{m=0}^{n-k-2} \frac{(n-2)!\,m!\,2^m\,(2n-2m-k-5)!}{m!\,(n-m-2)!\,(n-m-k-2)!\,2^{n-m-k-2}} \\
&= \frac{(n-2)!\,k}{2^{n-k-2}}\sum_{m=0}^{n-k-2} \frac{4^m (2n-2m-k-5)!}{(n-m-2)!\,(n-m-k-2)!}.
\end{align*}

Now, taking into account that

\begin{align*}
(1)_m &= m! \\
(2-n)_m &= (-1)^m \frac{(n-2)!}{(n-m-2)!} \\
(k+2-n)_m &= (-1)^m \frac{(n-k-2)!}{(n-k-m-2)!} \\
\Bigl(\frac{k+5}{2} - n\Bigr)_m &= \frac{(-1)^m (2n-k-5)!!}{2^m (2n-k-2m-5)!!} \\
\Bigl(\frac{k}{2} - n + 3\Bigr)_m &= \frac{(-1)^m (2n-k-6)!!}{2^m (2n-k-2m-6)!!}
\end{align*}

we have that

\begin{align*}
{}_3F_2\!\left(\begin{matrix} 1,\ 2-n,\ k+2-n \\ \frac{k+5}{2}-n,\ \frac{k}{2}-n+3 \end{matrix};\, 1\right)
&= \sum_{m \ge 0} \frac{(1)_m\cdot(2-n)_m\cdot(k+2-n)_m}{(\frac{k+5}{2}-n)_m\cdot(\frac{k}{2}-n+3)_m}\cdot\frac{1}{m!} \\
&= \sum_{m=0}^{n-k-2} \frac{m!\,(n-2)!\,(n-k-2)!\,2^m (2n-k-2m-5)!!\,2^m (2n-k-2m-6)!!}{(n-m-2)!\,(n-k-m-2)!\,(2n-k-5)!!\,(2n-k-6)!!\,m!} \\
&= \sum_{m=0}^{n-k-2} \frac{(n-2)!\,(n-k-2)!\,(2n-k-2m-5)!\,2^{2m}}{(n-m-2)!\,(n-k-m-2)!\,(2n-k-5)!} \\
&= \frac{(n-2)!\,(n-k-2)!}{(2n-k-5)!}\sum_{m=0}^{n-k-2} \frac{(2n-k-2m-5)!\,4^m}{(n-m-2)!\,(n-k-m-2)!}
\end{align*}

from where we deduce that

\[ \sum_{m=0}^{n-k-2} \frac{(2n-k-2m-5)!\,4^m}{(n-m-2)!\,(n-k-m-2)!} = \frac{(2n-k-5)!}{(n-2)!\,(n-k-2)!}\cdot {}_3F_2\!\left(\begin{matrix} 1,\ 2-n,\ k+2-n \\ \frac{k+5}{2}-n,\ \frac{k}{2}-n+3 \end{matrix};\, 1\right) \]

and hence

\begin{align*}
f_{k,n} &= \frac{(n-2)!\,k}{2^{n-k-2}}\sum_{m=0}^{n-k-2} \frac{4^m (2n-2m-k-5)!}{(n-m-2)!\,(n-m-k-2)!} \\
&= \frac{(n-2)!\,k}{2^{n-k-2}}\cdot\frac{(2n-k-5)!}{(n-2)!\,(n-k-2)!}\cdot {}_3F_2\!\left(\begin{matrix} 1,\ 2-n,\ k+2-n \\ \frac{k+5}{2}-n,\ \frac{k}{2}-n+3 \end{matrix};\, 1\right) \\
&= \frac{(2n-k-5)!\,k}{(2n-2k-4)!!}\cdot {}_3F_2\!\left(\begin{matrix} 1,\ 2-n,\ k+2-n \\ \frac{k+5}{2}-n,\ \frac{k}{2}-n+3 \end{matrix};\, 1\right)
\end{align*}

as we claimed. □
We must compute now the sums

\[ \sum_{k=1}^{n-1} k^2\cdot d_{k,n}, \qquad \sum_{k=1}^{n-2} k^2\cdot f_{k,n}. \]

To do that, we shall use the following auxiliary lemma.

Lemma 10. For every $n \ge 2$ and $m \ge 0$, let

\[ U_{n,m} = \sum_{k=0}^{n-2} \frac{k^m (n+k-2)!}{k!\,2^k}. \]

Then,

\begin{align*}
U_{n,0} &= (2n-4)!! \\
U_{n,1} &= (n-1)(2n-4)!! - (2n-3)!! \\
U_{n,2} &= (n^2-1)(2n-4)!! - (2n-1)(2n-3)!! \\
U_{n,3} &= (n^3 + 3n^2 - 3n - 1)(2n-4)!! - (3n^2 + n - 1)(2n-3)!!
\end{align*}

Proof. The proof of these identities is standard, using well known equalities for
hypergeometric functions and the lookup algorithm given in [21, p. 36]. We
shall prove in detail the identity for $m = 2$, and we leave the details of the rest
to the reader.
Notice that

\begin{align*}
U_{n,2} &= \sum_{k=0}^{n-2} \frac{k^2 (n+k-2)!}{k!\,2^k} = \sum_{k=1}^{n-2} \frac{k^2 (n+k-2)!}{k!\,2^k} = \sum_{k=0}^{n-3} \frac{(k+1)^2 (n+k-1)!}{(k+1)!\,2^{k+1}} \\
&= \sum_{k \ge 0} \frac{(k+1)^2 (n+k-1)!}{(k+1)!\,2^{k+1}} - \sum_{k \ge n-2} \frac{(k+1)^2 (n+k-1)!}{(k+1)!\,2^{k+1}}.
\end{align*}

Set

\[ X_n = \sum_{k \ge 0} \frac{(k+1)^2 (n+k-1)!}{(k+1)!\,2^{k+1}}, \qquad Y_n = \sum_{k \ge n-2} \frac{(k+1)^2 (n+k-1)!}{(k+1)!\,2^{k+1}}. \]

We compute now these two summands.
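As to $X_n$,

\[ X_n = \frac{(n-1)!}{2}\sum_{k \ge 0} \frac{(k+1)^2 (n+k-1)!}{(n-1)!\,(k+1)!\,2^k}. \]

If we set

\[ t_k = \frac{(k+1)^2 (n+k-1)!}{(n-1)!\,(k+1)!\,2^k}, \]

we have that

\[ \frac{t_{k+1}}{t_k} = \frac{(k+2)(k+n)}{(k+1)^2}\cdot\frac{1}{2} \]

and therefore, by the lookup algorithm [21, p. 36], we have that

\begin{align*}
X_n &= \frac{(n-1)!}{2}\cdot {}_2F_1\!\left(\begin{matrix} 2,\ n \\ 1 \end{matrix};\, \frac{1}{2}\right) \\
&= \frac{(n-1)!}{2}\cdot 2^n\cdot {}_2F_1\!\left(\begin{matrix} n,\ -1 \\ 1 \end{matrix};\, -1\right) \quad\text{(using (15.3.4) in [1, p. 559])} \\
&= (n-1)!\,2^{n-1}\sum_{k \ge 0} \frac{(n)_k (-1)_k}{(1)_k}\cdot\frac{(-1)^k}{k!} \\
&= (n-1)!\,2^{n-1}\left(\frac{(n)_0 (-1)_0}{(1)_0}\cdot\frac{(-1)^0}{0!} + \frac{(n)_1 (-1)_1}{(1)_1}\cdot\frac{(-1)^1}{1!}\right) \\
&= (n-1)!\,2^{n-1}(n+1).
\end{align*}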



As to K 



(jfc + n- l) 2 (2n + fc-3)! 
« (fc + n-l)!2 fe +"- 1 



k=0 



(n- l) 2 (2 n-3)! y, (jfe + n - l) 2 (2n + k - 3)! 



If we take now 



we have that 



(„_i)!2»-i ^ (fc + n _ 1)!2fe . 

(fe + n- l) 2 (2n + fc-3)! 



(fc + n-l)!2 fc - ( "~ffig,~ 3)! 



i fc +l _ (n + fc)(2n + fc-2) 1 

2 



t fc (£; + n-l) 2 

and therefore, again by the lookup algorithm [211 p. 36], we have that 



(n-l) 2 (2n-3)! 

r « — 1 1 Mr,„_l ' 3-^2 



(n- 1)!2»" 1 
(n- l) 2 (2n-3)! 

{n-l)\2 n ~ l 



1, n, 2n - 2 . 1 
7i — 1 , n — 1 '2 

2n - 2, 1 .1 

ri — l '2 



71-1 



2 -Pi 



2n -1,2. 

71 

(using 
Now



2n - 2, 1 . 1 

n - 1 '2 



2-2-Fi 



1 

n-1 



(using (15.3.4) in [TJ p. 559]) 



2 2 ("~ 2 >r(n- 1) |T(to- 1) i r(n) , 2T(n-§) 

r(2n - 2) 
2 2(n-2)( n _ 2 )! 



2-Fi 



: 2 • 

2 ' (2ra-3)! 
2 n " 1 (n-l)! 

: (2n-3)!! + 

2n- 1, 2 . 1 

n ' 2 

r(n) 



r(o) r(i) r(|) 

(2n-3)!h 



(using [13] ) 



(n-l)! + 2- 



Ira— 1 



2 2 • 2 Fx 



2, 1 - n 
n 



1 (using (15.3.4) in Q] P- 559]) 



= 4- 



r(n 



r(n+|) 



r(f 



2 2(2-n) r ( 2n _ i) v r(|) 

2 2 "- 2 (n- 1)! f(2n-3)!! , (2n-l)!! 

(2n-2)! 
2"- 1 (n - 1)! 

(2n- 2)! 



2T(n)J (using [13]) 



( (.n-3)U (2n-ljU x 
((2n - 3)!! + (2n - 1)!! + 2™ • (n - 1)!) 



Therefore, 



Y n = 



(n- l) 2 (2n-3)! 

(n- 1)!2»- 1 
1 



2™- 1 («- I)' 
(2n-3)!! 
2™- 1 (n- 1)! 



n-1 (2n-2)! 
2"- 2 (n + l)(n-l)! + (2n-l)!! 



((2n - 3)!! + (2n - 1)!! + 2" ■ (n - 1)!) 



and finally

\begin{align*}
U_{n,2} = X_n - Y_n &= 2^{n-2}(n+1)(n-1)! - (2n-1)!! \\
&= (n^2-1)(2n-4)!! - (2n-1)(2n-3)!!
\end{align*}

as we claimed. □

Lemma 11. For every $n \ge 2$,

\[ \sum_{k=1}^{n-1} k^2 d_{k,n} = (4n-1)(2n-3)!! - 3(2n-2)!!. \]

Proof. By equation (1),

\begin{align*}
\sum_{k=1}^{n-1} k^2 d_{k,n} &= \sum_{k=1}^{n-1} \frac{k^3 (2n-k-3)!}{(n-k-1)!\,2^{n-k-1}} = \sum_{k=0}^{n-2} \frac{(n-k-1)^3 (n+k-2)!}{k!\,2^k} \\
&= (n-1)^3 U_{n,0} - 3(n-1)^2 U_{n,1} + 3(n-1) U_{n,2} - U_{n,3} \\
&= (n-1)^3 (2n-4)!! - 3(n-1)^2\bigl((n-1)(2n-4)!! - (2n-3)!!\bigr) \\
&\qquad + 3(n-1)\bigl((n^2-1)(2n-4)!! - (2n-1)(2n-3)!!\bigr) \\
&\qquad - \bigl((n^3 + 3n^2 - 3n - 1)(2n-4)!! - (3n^2 + n - 1)(2n-3)!!\bigr) \\
&= (4n-1)(2n-3)!! - 3(2n-2)(2n-4)!!.
\end{align*}

□
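The identities in Lemmas 10 and 11 are easy to test numerically for small $n$. The sketch below assumes the definition of $U_{n,m}$ as the sum of $k^m (n+k-2)!/(k!\,2^k)$ over $k = 0, \ldots, n-2$ (with $0^0 = 1$), and the formula (1) for $d_{k,n}$; exact fractions are used to avoid any rounding. All function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def double_factorial(m):
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def d(k, n):
    """Number of trees with n leaves in which a fixed leaf has depth k, by equation (1)."""
    return Fraction(factorial(2 * n - k - 3) * k, factorial(n - k - 1) * 2 ** (n - k - 1))


def U(n, m):
    """U_{n,m} as a finite sum (Python evaluates 0**0 as 1)."""
    return sum(Fraction(k ** m * factorial(n + k - 2), factorial(k) * 2 ** k)
               for k in range(0, n - 1))


def U_closed(n, m):
    even, odd = double_factorial(2 * n - 4), double_factorial(2 * n - 3)
    return [even,
            (n - 1) * even - odd,
            (n * n - 1) * even - (2 * n - 1) * odd,
            (n ** 3 + 3 * n * n - 3 * n - 1) * even - (3 * n * n + n - 1) * odd][m]


for n in range(2, 8):
    lhs = sum(k * k * d(k, n) for k in range(1, n))
    rhs = (4 * n - 1) * double_factorial(2 * n - 3) - 3 * double_factorial(2 * n - 2)
    print(n, lhs, rhs)
```

Since every tree assigns some depth between 1 and $n-1$ to leaf 1, the values $d_{k,n}$ must also add up to the total number $(2n-3)!!$ of trees, which gives one more consistency check.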
Lemma 12. For every $n \ge 2$,

\[ \sum_{k=1}^{n-2} k^2 f_{k,n} = \frac{1}{3}(4n+1)(2n-3)!! - \frac{3}{2}(2n-2)!!. \]

Proof. To simplify the notations, set $S_n = \sum_{k=1}^{n-2} k^2 f_{k,n}$. As we have seen in the
proof of Lemma 9,

\[ f_{k,n} = \frac{(n-2)!\,k}{2^{n-k-2}}\sum_{m=0}^{n-k-2} \frac{4^m (2n-2m-k-5)!}{(n-m-2)!\,(n-m-k-2)!} \]

and therefore

n-2 n-k-2 



fc 3 VV 4"(2n-fc-2m-5)! 
2"- 2 ^ ^ (n-fc-2)!(n-fe-m-2)! 

fc=l m=0 v ' y ' 

= (n-2)! ^ 3 n ^ 2 4»- fc - 2 -'(fc + 2m-l)! 
2 n-2 (fc + m)!m! 

fc=l m=0 v ; 

n — 2 , q / n— k — 2 



(n - 2). 2 «- y ^ f i + v j_A+ 2m -A] 

^ 2 fc I fe ^ 4 m mV fe + m // 

fc=l \ m=l V / / 

_ / „2 + 2 ^ fc3 «-^ 2 x /fc + 2m - 1 V 
(n-2)!2"- 2 U-^^ + E^ £ fc J 

\ k=\ m=l \ / , 



Set now 



_ ^ fc3 «^-2 ^ / fc + 2m - A _ ^ fc 3 n ^ 2 1 /fc + 2m-l 
» _ 2^ 2 fc ^ 4 m m I k + m / 2^ 2 fe ^ 4 m m I fc + m 

k— 1 m— 1 \ / m—1 \ 

Since £3 = 0, we have that 

p=3 



and 



9 , , = ( P -2f ^ 3 fc 3 (2p-k-3 

p+i p 2 p ^ 2 fc (p-fc- l)4P- fc - 1 ^ p-1 

_ (p - 2) 3 1 ^ fc 3 (2p-fc-3)! 

~ 2~p + 2 2 p- 2 2- fc O - fe - l)(p - l)!(p - fc - 2) 

_ (p - 2) 3 1 ^ fc 3 (2p-fc-3)! 

_ (p-2) 3 1 ^ (p-fc-2) 3 (p + fc-l)! 

~Yp + 2 2 P-2(p- l)! 2^ 2 fc -P+ 2 (fc + l)! 
(p-2) ;

(p-2) ; 
2p 



2f- 1 (p- 1): 



P-2 

E 

fc=2 



1 r 

Hp- 1)! 



(p-fc-l) 3 (p + fc-2)! 
(p-/s-l) 3 (p + /c-2)! 



2P-!(p- 1) 



fc=0 

1 



2 fe /c! 



>-l) 3 (p-2)!-^-2) 3 (p-l)! 
(p-l) 2 1 ^ (p-fc- l) 3 (p + fc-2)! 



2 p-i 2P~ 1 (p-l) 



(P-l) 2 
2 p-i 



E 

fe=0 



2 fc fc! 



(2p - 2) 



-((4p - l)(2p - 3)!! - 3(2p - 2)!!) (by Lemma [TT) 



2p 

Therefore 



(2p-2)!! 



«-E(<*-dS-^-») 



p=3 



Now, applying Gosper's algorithm [211 p. 77] we have that 



p=3 

and then 



(2p-2)!! 3-2 2 " 



— 32(^-3n-l) 



2n - 3 
n- 1 



39 • 2^ 



5' 



L+r(32(4n 2 -3n-l) 



3 • 2 2 «+! 



11 • 2" - 8(n 2 + 2) 



2n- 3 
n — 1 



39-2 



2;i 



2 n+l 

3(n + l) + 



- 3(n-3) 
(4n + l)(2n-3)!! 
3(2n-4)!! ' 



Finally, 



S n =( n -2)!2"- 2 l6-^ + 5; 



„_ 2 , (4n + l)(2n-3)!l 



-3(n- 1)!2™- 2 + 

1 3 ° 

-(in + l)(2n - 3)!! - -(2n - 2)!!. 



□ 
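Lemma 12 can be checked numerically without evaluating any hypergeometric function, by using the counting sum for $f_{k,n}$ from the proof of Lemma 9: choose the $m$ extra leaves below $\mathrm{LCA}_T(1,2)$, one of the $(2m)!!$ trees on them, and an ordered $k$-forest on the remaining $n-m-2$ leaves. The Python sketch below follows that description; the function names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb, factorial


def double_factorial(m):
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def ordered_forests(leaves, k):
    """Number of ordered k-forests on a set of `leaves` labels (k >= 1, leaves >= k)."""
    return Fraction(factorial(2 * leaves - k - 1) * k,
                    factorial(leaves - k) * 2 ** (leaves - k))


def f(k, n):
    """Number of trees with n leaves in which the LCA of leaves 1 and 2 has depth k."""
    if k == 0:
        return Fraction(double_factorial(2 * n - 4))
    return sum(comb(n - 2, m) * double_factorial(2 * m) * ordered_forests(n - m - 2, k)
               for m in range(0, n - k - 1))


for n in range(2, 8):
    lhs = sum(k * k * f(k, n) for k in range(1, n - 1))
    rhs = (Fraction(4 * n + 1, 3) * double_factorial(2 * n - 3)
           - Fraction(3, 2) * double_factorial(2 * n - 2))
    print(n, lhs, rhs)
```

As with $d_{k,n}$, the values $f_{0,n}, \ldots, f_{n-2,n}$ must add up to $(2n-3)!!$, because every tree assigns some depth between 0 and $n-2$ to the lowest common ancestor of leaves 1 and 2.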
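Summarizing, by Lemmas 8, 11 and 12 we have that

\begin{align*}
E_U(\overline{\Phi}^{(2)}_n) &= \frac{1}{(2n-3)!!}\Bigl(n\sum_{k=1}^{n-1} k^2\cdot d_{k,n} + \binom{n}{2}\sum_{k=1}^{n-2} k^2\cdot f_{k,n}\Bigr) \\
&= \frac{1}{(2n-3)!!}\Bigl(n\bigl((4n-1)(2n-3)!! - 3(2n-2)!!\bigr) \\
&\qquad\qquad + \binom{n}{2}\Bigl(\frac{1}{3}(4n+1)(2n-3)!! - \frac{3}{2}(2n-2)!!\Bigr)\Bigr) \\
&= \frac{1}{6}n(4n^2 + 21n - 7) - \frac{3n(n+3)}{4}\cdot\frac{(2n-2)!!}{(2n-3)!!}
\end{align*}

as we claimed.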